fix(cluster-labels): single-flight guard for vocab cache builds by lstein · Pull Request #274 · lstein/PhotoMapAI

lstein · 2026-05-23T00:22:16Z

Summary

On a cold vocab cache, concurrent /cluster_labels and /image_label requests both dispatch get_or_build_vocab_embeddings through asyncio.to_thread. Every caller saw cache_path missing, loaded the encoder, and re-encoded the full vocabulary before the first writer's atomic .tmp → rename landed — observed as 5+ duplicate "Building vocab embeddings cache at …" lines per cold start.
Added a per-encoder threading.Lock registry around the build path in cluster_labels.py, with a re-check of _read_cached_vocab inside the lock so the second waiter picks up the first builder's atomic rename instead of redundantly re-encoding.
Per-spec (rather than global) locks preserve the existing ability for two albums with different encoders to build in parallel; only redundant builds of the same encoder are serialized.

Test plan

pytest tests/backend/test_cluster_labels.py — 38/38 pass, including a new test_concurrent_builds_are_serialized that spawns 4 threads against a gated fake encoder and asserts encode_calls == 1.
ruff check photomap/backend/cluster_labels.py tests/backend/test_cluster_labels.py clean.
Manual: open the app with a cleared ~/.cache/photomap/cluster_vocab/ and trigger several concurrent slide-drawer opens; log should show one "Building vocab embeddings cache" line followed by one "Vocab embeddings cached".
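A test in the spirit of test_concurrent_builds_are_serialized might look something like the sketch below. It is not the test from the PR: `GatedFakeEncoder`, `make_single_flight`, and `run_concurrent_builds` are hypothetical names, and the in-memory cache stands in for the on-disk vocab cache so the example stays self-contained.

```python
import threading
import time


class GatedFakeEncoder:
    """Hypothetical fake: counts encode calls and blocks until released."""

    def __init__(self):
        self.encode_calls = 0
        self.gate = threading.Event()
        self._count_lock = threading.Lock()

    def encode(self, words):
        with self._count_lock:
            self.encode_calls += 1
        # Hold the first builder here so the other threads pile up behind it.
        self.gate.wait(timeout=5)
        return [float(len(w)) for w in words]


def make_single_flight(encoder):
    """Tiny in-memory single-flight wrapper standing in for the code under test."""
    lock = threading.Lock()
    cache = {}

    def get_or_build(words):
        key = tuple(words)
        if key in cache:
            return cache[key]
        with lock:
            if key not in cache:  # re-check inside the lock
                cache[key] = encoder.encode(words)
            return cache[key]

    return get_or_build


def run_concurrent_builds(n_threads=4):
    encoder = GatedFakeEncoder()
    get_or_build = make_single_flight(encoder)
    start = threading.Barrier(n_threads + 1)
    results = []

    def worker():
        start.wait(timeout=5)  # release all workers at once
        results.append(get_or_build(["cat", "zebra"]))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    start.wait(timeout=5)
    time.sleep(0.05)  # give the waiters time to queue on the lock
    encoder.gate.set()  # let the single builder finish
    for t in threads:
        t.join(timeout=5)
    return encoder.encode_calls, results
```

Calling `run_concurrent_builds()` should report exactly one encode call however the threads interleave, since every waiter finds the cached value on its re-check.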

🤖 Generated with Claude Code

Concurrent /cluster_labels and /image_label requests both dispatch get_or_build_vocab_embeddings through asyncio.to_thread, so on a cold cache every caller would re-load the encoder and re-encode the full vocabulary before the first writer's atomic rename landed. Add a per-encoder threading.Lock with a re-check inside the lock so the second waiter picks up the first builder's output instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lstein merged commit 85dfd51 into master May 23, 2026
5 checks passed

lstein deleted the lstein/fix/vocab-build-guard branch May 23, 2026 20:21

lstein merged 1 commit into master from lstein/fix/vocab-build-guard
